Method and system for more efficiently utilizing processors of a graphics system

ABSTRACT

A method and system for utilizing processor(s) and bypass processor(s) of a computer graphics system are disclosed. The processor(s) and bypass processor(s) render primitives, which are ordered based on their left corners. The method and system include providing a merge circuit, a distributor, a feedback circuit and a controller. The merge circuit determines left and right edges for each primitive. The distributor is coupled with the feedback circuit and outputs a first portion of the primitives. The distributor provides a second portion of the primitives to the processor(s) and a third portion of the primitives to the bypass processor(s) if the first portion includes more primitives than there are processor(s). The second portion includes no more primitives than there are processor(s). The feedback circuit, coupled to the merge circuit, re-inputs a fourth portion of the primitives to the bypass processor(s) until the first portion has been rendered for a line.

CROSS-REFERENCE TO RELATED APPLICATIONS

The present application is related to co-pending U.S. patent application Ser. No. 09/978,476 entitled “Method and System for Efficiently Loading Primitives into Processors of a Graphics System,” filed on Oct. 16, 2001 and assigned to the assignee of the present application. The present application is also related to co-pending U.S. patent application Ser. No. 09/583,063 entitled “Method and System for Providing Hardware Sort in a Graphics System,” filed on May 30, 2000 and assigned to the assignee of the present application.

FIELD OF THE INVENTION

The present invention relates to computer graphics systems, and more particularly to a method and system for more efficiently using processors of a computer graphics system for processing data for primitives.

BACKGROUND OF THE INVENTION

A conventional computer graphics system can display graphical images of objects on a display. The display includes a plurality of display elements, known as pixels, typically arranged in a grid. In order to display objects, the conventional computer graphics system typically breaks each object into a plurality of polygons, termed primitives. A conventional system then renders the primitives in a particular order.

Some computer graphics systems are capable of rendering the primitives in raster order. Such a system is described in U.S. Pat. No. 5,963,210, entitled “Graphics Processor, System and Method for Generating Screen Pixels in Raster Order Utilizing a Single Interpolator” and assigned to the assignee of the present application. In such a system, all of the primitives intersecting a particular pixel are rendered for that pixel. The primitives intersecting a next pixel in the line are then rendered. Typically, this process proceeds from left to right in the line until the line has been rendered, then recommences on the next line. The frame is rendered line by line, until the frame has been completed.

In order to render the frame, the primitives are loaded into processors. Typically, all of the primitives starting at a particular line are loaded into the processors at the start of the line. After the line has completed processing, primitives which have expired are ejected. An expired primitive is one which cannot be present on the next line. In other words, an expired primitive has a bottom that is no lower than the line that was just processed. Any new primitives for the next line are loaded at the start of the next line. The line is then processed as described above. This procedure continues until the frame is rendered.

Although the system and method function well for their intended purpose and can render primitives in raster order, one of ordinary skill in the art will readily recognize that the system and method have limitations. In particular, the complexity of the frame being rendered is limited by the number of processors available. As described above, all of the primitives for a line are provided to processors at the start of a line and ejected at the end of a line. The total number of primitives that can be provided to the processors is limited by the number of processors. Thus, the total number of primitives that can be rendered for a particular line is limited by the number of processors in the system. For similar reasons, the total number of primitives that can overlap at a particular pixel is also limited by the number of processors in the system. Typically, the number of processors is on the order of sixteen or thirty-two. As a result, the number of primitives that overlap at a particular pixel and that can be processed for a line is limited to sixteen or thirty-two. The complexity of the frame is thereby limited. This limitation can be relaxed by increasing the number of processors. However, increasing the number of processors increases the space consumed by the graphics system, which is undesirable.

Furthermore, the processes of loading primitives and ejecting expired primitives each consume time and resources. In addition, in a complex scene, many primitives might expire at the end of a particular line and a large number of primitives might start at the next line. Ejecting the expired primitives and loading the new primitives might cause a significant delay in the pipeline.

Accordingly, what is needed is a system and method for more efficiently utilizing the processors of a computer graphics system. The present invention addresses such a need.

SUMMARY OF THE INVENTION

The present invention provides a method and system for more efficiently utilizing at least one processor and at least one bypass processor of a computer graphics system. The processor(s) include a particular number of processors. In addition, the term bypass processor includes one or more bypass processors. The processor(s) and at least one bypass processor render a plurality of primitives. Each primitive has a left corner, a right corner and a top. The primitives are ordered based on the left corner of each of the plurality of primitives. The method and system include providing a merge circuit, a distributor and a feedback circuit. The merge circuit determines a left edge and a right edge for each of the plurality of primitives. The distributor is coupled with the feedback circuit and outputs a first portion of the plurality of primitives. The distributor provides a second portion of the plurality of primitives to the processor(s) and provides a third portion of the plurality of primitives to the at least one bypass processor if the first portion of the plurality of primitives includes more primitives than the particular number of processors. The second portion of the plurality of primitives includes a number of primitives that is not greater than the particular number of processors. The feedback circuit, which is coupled to the merge circuit and the distributor, re-inputs a fourth portion of the plurality of primitives to the at least one bypass processor until the first portion of the plurality of primitives has been rendered for a particular line. The controller controls the feedback circuit, the distributor and the merge circuit.

According to the system and method disclosed herein, the present invention provides a more efficient mechanism for utilizing the processors.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a computer system including a graphics system.

FIG. 2 is a diagram of a portion of a display including a plurality of primitives rendered for a frame.

FIG. 3 is a flow chart of a method for rendering primitives.

FIG. 4 is a block diagram of a preferred embodiment of a computer graphics system using one embodiment of a system in accordance with the present invention.

FIG. 5 is a block diagram of one embodiment of a system in accordance with the present invention for more efficiently loading primitives into processors in a computer graphics system.

FIG. 6 is a block diagram of one embodiment of a system in accordance with the present invention for more efficiently utilizing processors in a computer graphics system.

FIG. 7 is a high-level flow chart of one embodiment of a method in accordance with the present invention for more efficiently using processors in a computer graphics system.

FIGS. 8A and 8B depict a more detailed flow chart of one embodiment of a method in accordance with the present invention for more efficiently using processors in a computer graphics system.

DETAILED DESCRIPTION OF THE INVENTION

The present invention relates to an improvement in computer graphics systems. The following description is presented to enable one of ordinary skill in the art to make and use the invention and is provided in the context of a patent application and its requirements. Various modifications to the preferred embodiment will be readily apparent to those skilled in the art and the generic principles herein may be applied to other embodiments. Thus, the present invention is not intended to be limited to the embodiment shown, but is to be accorded the widest scope consistent with the principles and features described herein.

FIG. 1 is a block diagram of a computer system 10 including a computer graphics system 20. The computer system 10 also includes a central processing unit 12, a display 14, a user interface 16 such as a keyboard and/or mouse and a memory 18. The graphics system 20 is depicted as including an internal memory 22 and processors 24 that are coupled by a bus 23. The graphics system 20 typically has other components that are not shown for clarity.

FIG. 2 depicts a portion of the display 14. The display 14 includes a plurality of pixels. For clarity, only one pixel 15 is depicted. On the display are depicted primitives 30, 40 and 50. The primitives 30, 40 and 50 are typically part of a scene containing many primitives. The primitives in the scene may also overlap, as is shown for primitives 30 and 40.

Referring to FIGS. 1 and 2, in order to render a scene on the display 14, the graphics system 20 must render the polygons. In a graphics system 20 described in U.S. Pat. No. ______, entitled “GRAPHICS PROCESSOR, SYSTEM AND METHOD FOR GENERATING SCREEN PIXELS IN RASTER ORDER UTILIZING A SINGLE INTERPOLATOR” and assigned to the assignee of the present application, the graphics system 20 renders the primitives 30, 40 and 50 in raster order. In other words, the graphics system 20 renders a scene pixel by pixel in raster order. Thus, in the area where primitives 30 and 40 overlap, two primitives are rendered for each pixel. In order to render the scene, data for the primitives 30, 40 and 50 must be loaded from the internal memory 22 to the processors 24.

FIG. 3 depicts a high level flow chart of a method 60 for rendering primitives in a scene used in the above-mentioned U.S. patent. At the start of the line, new primitives for the line are loaded into the processors 24, via step 62. The primitives are loaded from the internal memory 22 to the processors 24. Thus, primitives which commenced at a previous line and which will contribute to the current line remain in the processors 24. The line is then processed, via step 64. Step 64 may include performing interpolation, texture processing, antialiasing or other operations used in rendering the scene. It is determined whether processing of the line is complete, via step 66. If not, then processing continues in step 64. If the line is completed, then the primitives that have expired are evicted from some or all of the processors 24, via step 68. A primitive that has expired cannot contribute to the next line and thus has a bottom that is no lower than the current line being processed. The new line is then commenced, via step 70. Any new primitives are then loaded, via step 62. The method 60 thus repeats until the frame has been rendered.
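The per-line loop of the method 60 can be sketched in software as follows. This is a minimal illustrative model only, not the circuitry of the above-mentioned U.S. patent. The function name `render_frame_method_60`, the dictionary fields `id`, `top` and `bottom`, and the convention that y increases downward are all assumptions made for the sketch. The model raises an error when a line needs more primitives than there are processors, which is the limitation discussed below.

```python
def render_frame_method_60(primitives, num_lines, num_processors):
    """Toy model of steps 62-70: load at line start, process, evict expired."""
    loaded = []    # primitives currently held by the processors 24
    history = []   # ids processed on each line, for inspection
    for y in range(num_lines):
        # Step 62: load the primitives that start on this line.
        starting = [p for p in primitives if p["top"] == y]
        if len(loaded) + len(starting) > num_processors:
            raise RuntimeError(f"line {y}: more primitives than processors")
        loaded.extend(starting)
        # Steps 64/66: process the line (here we only record what is loaded).
        history.append([p["id"] for p in loaded])
        # Step 68: evict primitives whose bottom is no lower than this line.
        loaded = [p for p in loaded if p["bottom"] > y]
        # Step 70: the loop continues with the next line.
    return history


prims = [
    {"id": "a", "top": 0, "bottom": 1},
    {"id": "b", "top": 1, "bottom": 2},
]
print(render_frame_method_60(prims, num_lines=3, num_processors=2))
```

With only one processor, the same input fails on line 1, where primitives a and b are both present.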

Although the method and system shown in FIGS. 1 and 3 function well for their intended purpose, one of ordinary skill in the art will readily realize that there are limitations. The number of primitives that can be processed for a particular pixel in a line and the number of primitives that can be processed for the entire line are limited by the number of primitives that can be loaded into the processors 24. This number is the same as the number of processors 24, which is typically sixteen or thirty-two. Thus, the complexity of the scene that can be rendered is limited. Although increasing the number of processors 24 addresses this problem, the space consumed by the graphics system 20 will also increase. Such an increase in space is undesirable. In addition, loading primitives in the processors 24 in step 62 requires time. Similarly, evicting primitives from the processors 24 in step 68 requires time. If a certain line differs significantly from a previous line, the number of primitives evicted and loaded may be quite large. This is particularly true if the bus 23 does not have sufficient throughput. As a result, the time required to perform steps 62 and 68 becomes significant, delaying completion of the frame by the graphics system 20.

The present invention provides a method and system for utilizing at least one processor and at least one bypass processor of a computer graphics system. The processor(s) include a particular number of processors. The processor(s) and at least one bypass processor render a plurality of primitives. Each primitive has a left corner, a right corner and a top. The primitives are ordered based on the left corner of each of the plurality of primitives. The method and system include providing a merge circuit, a distributor and a feedback circuit. The merge circuit determines a left edge and a right edge for each of the plurality of primitives per line. The distributor is coupled with the feedback circuit and outputs a first portion of the plurality of primitives. The distributor provides a second portion of the plurality of primitives to the processor(s) and provides a third portion of the plurality of primitives to the at least one bypass processor if the first portion of the plurality of primitives includes more primitives than the particular number of processors. The second portion of the plurality of primitives includes a number of primitives that is not greater than the particular number of processors. The feedback circuit, which is coupled to the merge circuit and the distributor, re-inputs a fourth portion of the plurality of primitives to the at least one bypass processor until the first portion of the plurality of primitives has been rendered for a particular line. The controller controls the feedback circuit, the distributor and the merge circuit.

The present invention will be described in terms of a particular computer system, a particular computer graphics system, a particular set of components and a particular set of processors. However, one of ordinary skill in the art will readily recognize that this method and system will operate effectively for other computer systems, other computer graphics systems, other and/or additional components and other numbers of processors. For example, the present invention is described using a single bypass processor and a single distributor. One of ordinary skill in the art will, however, recognize that the present invention is consistent with the use of multiple bypass processors and/or multiple distributors. Moreover, the present invention will be described as providing primitives and/or portions of primitives to components, such as processors. However, one of ordinary skill in the art will readily recognize that in a preferred embodiment, some portion of the data for the primitives is actually provided to the components. The present invention will also be described in the context of a system which feeds back the primitives which have not been evicted, using a y-loop circuit, and sorts the primitives using a hardware sorter. However, one of ordinary skill in the art will readily realize that the present invention is consistent with a system that does not utilize a y-loop circuit or a hardware sorter.

To more particularly illustrate the method and system in accordance with the present invention, refer now to FIG. 4, depicting one embodiment of a computer graphics system 100 using one embodiment of a system in accordance with the present invention. The computer graphics system 100 is preferably used in the computer system 10 in place of the computer graphics system 20. The computer graphics system 100 includes a system 140 in accordance with the present invention for more efficiently using processors in the computer graphics system 100. The system 140 is termed herein an x-loop system 140. The computer graphics system 100 also includes an internal memory 110, a processor block 120, additional processing circuitry 122, y-loop circuitry 130, and a sorter 125. The processor block 120 includes processors 121 and a bypass processor 123. In a preferred embodiment, there are sixteen processors 121. The additional processing circuitry 122 could include one or more interpolators, sorters, antialiasing units and other circuitry actually used in rendering the frame. Some embodiments of the additional processing circuitry 122 are described in the above-mentioned U.S. patent. The internal memory 110 is preferably a random access memory (“RAM”) 110. Data for the primitives are preferably loaded into the RAM 110. This data preferably includes an identifier for each primitive, the top and bottom coordinates for each primitive and can include texture, color, or other data used in processing the primitive. The y-loop circuitry 130 and sorter 125 are provided in a preferred embodiment. However, the system 140 in accordance with the present invention could be used within a system not using the y-loop circuitry 130 and the sorter 125.

FIG. 5 is a block diagram of one embodiment of the y-loop circuitry 130 that more efficiently loads primitives into processors in a computer graphics system 100. The y-loop circuitry 130 includes a y-loop merge circuit 131, a y-loop distributor 132, a y-loop feedback circuit 134 and a y-loop controller 133. The y-loop circuitry 130 preferably receives primitives that are ordered based upon the tops of the primitives in the inputs 135. The primitives are preferably triangles. However, nothing prevents the use of primitives having other shapes. The primitives are provided to the y-loop merge circuit 131. The y-loop merge circuit 131 adds new primitives received from the inputs 135 that have tops that are not lower than a current line. The new primitives are merged with those primitives received from the y-loop feedback circuit 134, discussed below. The new primitives and fed back primitives are provided from the y-loop merge circuit 131 to the y-loop distributor 132. The y-loop distributor 132 eliminates expired primitives. An expired primitive has a bottom that is higher than the current line and which, therefore, should not contribute to the current line or subsequent lines. The y-loop distributor 132 then outputs primitives for a current line. The primitives are provided to the sorter 125, depicted in FIG. 4, to be provided to the processor block 120 through the x-loop circuitry 140. The y-loop distributor 132 also provides the primitives to the y-loop feedback circuit 134. The y-loop feedback circuit 134 re-inputs the primitives output by the y-loop distributor 132 to the y-loop merge circuit 131. The y-loop controller 133 controls the y-loop feedback circuit 134, the y-loop distributor 132 and the y-loop merge circuit 131.
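The merge, distribute and feed-back cycle of the y-loop circuitry 130 can be illustrated with a small software model. The sketch below is only an illustration under stated assumptions: the name `y_loop`, the tuple layout `(identifier, top, bottom)`, and the convention that y increases downward (so that "higher" means a smaller y value) are hypothetical and are not taken from the circuitry itself.

```python
from collections import deque


def y_loop(primitives, num_lines):
    """Toy model of the y-loop: returns the primitive ids output for each line."""
    # Inputs 135: primitives ordered by their tops. Tuples are (id, top, bottom).
    inputs = deque(sorted(primitives, key=lambda p: p[1]))
    fifo = deque()      # stands in for the y-loop feedback circuit 134
    per_line = []
    for y in range(num_lines):
        # Merge circuit 131: fed back primitives plus new primitives whose
        # tops are not lower than the current line.
        merged = list(fifo)
        fifo.clear()
        while inputs and inputs[0][1] <= y:
            merged.append(inputs.popleft())
        # Distributor 132: eliminate primitives whose bottom is higher than
        # the current line, then output the remainder for this line.
        live = [p for p in merged if p[2] >= y]
        per_line.append([p[0] for p in live])
        # The same primitives are re-input through the feedback circuit.
        fifo.extend(live)
    return per_line


print(y_loop([("a", 0, 1), ("b", 1, 3), ("c", 1, 1)], num_lines=4))
```

In this model no separate end-of-line load or eject phase exists; primitives enter and leave as a side effect of each pass through the loop.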

The y-loop circuitry 130 allows primitives to be more efficiently input to the processor block 120. In particular, the y-loop circuitry 130 results in primitives being continuously loaded and evicted by the system 100. As a result, any delays at the end of a line due to ejecting and loading of primitives can be reduced or eliminated. Thus, loading of primitives to the processor block 120, via the sorter 125 and the x-loop circuitry 140, in the graphics system 100 can be made more efficient. Furthermore, the y-loop feedback circuitry 134, which is preferably a y-loop FIFO 134, can hold data for a large number of primitives, preferably 1024 primitives. In effect, the FIFO 134 allows the y-loop circuitry 130 to provide data to a large number of processors. Therefore, the y-loop 130 can be used with a large number of virtual (or actual) processors. In a preferred embodiment, the number of actual, physical processors in the processor block 120 is different from the number of virtual processors. The number of virtual processors is set by the number of processors the graphics system 100 appears to have because of the configuration and functions of the processors actually used. In a preferred embodiment, the graphics system 100 has one thousand and twenty-four virtual processors. The number of actual processors 121 and 123 in the processor block 120 is substantially less. In a preferred embodiment, there are sixteen processors 121 and one bypass processor 123 in the processor block 120. However, nothing prevents the use of another number of processors and/or another number of virtual processors.

The sorter 125 is preferably a hardware sorter, as described in co-pending U.S. patent application Ser. No. 09/583,063 entitled “Method and System for Providing Hardware Sort in a Graphics System,” filed on May 30, 2000 and assigned to the assignee of the present application. Applicant hereby incorporates by reference the above-mentioned co-pending patent application. However, in an alternate embodiment, the sorter 125 could be formed in another fashion. The sorter 125 sorts the primitives from the y-loop circuitry 130. The sorter 125 sorts the primitives based upon the left edge of the primitives. In a preferred embodiment, the sorter 125 sorts the primitives based upon the left corner of each primitive. The sorter 125 passes the primitives, sorted based upon their left corners, to the x-loop circuitry 140.

FIG. 6 is a block diagram of one embodiment of x-loop circuitry 140 in accordance with the present invention for more efficiently utilizing processors in a computer graphics system 100. The x-loop circuitry 140 includes a merge circuit 150, a distributor 160, a controller 170 and a feedback circuit 180. The merge circuit 150 receives the primitives from the sorter 125, via inputs 142, 144 and 146. The merge circuit 150 preferably receives the index, primitive type and left corner of the primitive via the inputs 142, 144 and 146.

The controller 170 includes a pixel counter 172, a feedback circuit control block 174, a flush line block 176, and an output 178 for the current position. The pixel counter 172 determines the current position in the line. In other words, the pixel counter 172 determines the current x value. The current position is provided to the merge circuit 150 and the distributor 160 via the output 178. The control block 174 and the flush line block 176 control the feedback circuit 180.

The merge circuit 150 includes a merge block 152, a compare block 154 and a calculator 156. The merge circuit 150 receives new primitives having a furthest left vertex that is left of the current position. The merge circuit 150 determines the primitives to be received by comparing the left vertex (Xleft) with the current position using the compare block 154. The merge circuit 150 also receives any primitives that are being fed back through the x-loop circuit 140 via the feedback circuit 180. The merge circuit 150 also calculates the left and right edges of the primitive for the current line as well as the span of the primitive. The left edge is the farthest left point for the primitive on the current line. The right edge is the farthest right point for the primitive on the current line. The span is the difference between the left and right edges and determines whether the primitive is valid. The primitive is valid if the left edge is not farther right than the right edge.

If no antialiasing is performed, then the span is preferably determined as follows. The left and right edges are determined by calculating the current line (the y coordinate of the current line) and defining which sides of the primitive intersect the current line. Interpolation is then used to calculate the x coordinates at which the sides intersect the current line. The x-coordinates are then preferably rounded off to the nearest pixel to determine the left and right edges. The span is valid if the x coordinate for the left edge is less than the x coordinate for the right edge.
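The non-antialiased span determination can be sketched as follows. This is an illustrative model rather than the calculator 156 itself. The function name `span_without_antialiasing`, the representation of a primitive as a list of `(x, y)` vertices, the skipping of horizontal sides, and the choice of round-half-up for "rounded off to the nearest pixel" are assumptions of the sketch.

```python
import math


def span_without_antialiasing(vertices, yc):
    """Return (left_edge, right_edge, valid) on line yc, or None if no side
    of the polygon intersects the line. vertices is a list of (x, y)."""
    crossings = []
    n = len(vertices)
    for i in range(n):
        (x0, y0), (x1, y1) = vertices[i], vertices[(i + 1) % n]
        if y0 == y1:
            continue  # horizontal side: skipped in this sketch (assumption)
        # Order the endpoints so (xt, yt) is the top and (xb, yb) the bottom.
        (xt, yt), (xb, yb) = ((x0, y0), (x1, y1)) if y0 < y1 else ((x1, y1), (x0, y0))
        if yt <= yc <= yb:
            q = (yc - yt) / (yb - yt)           # interpolation parameter
            crossings.append(xt + (xb - xt) * q)
    if not crossings:
        return None
    # Round to the nearest pixel (round-half-up is an assumption here).
    left = math.floor(min(crossings) + 0.5)
    right = math.floor(max(crossings) + 0.5)
    # The span is valid if the left edge is less than the right edge.
    return left, right, left < right


triangle = [(4, 0), (0, 8), (8, 8)]
print(span_without_antialiasing(triangle, 4))
```

At the top vertex of this triangle (line 0) both sides meet at x = 4, so the left and right edges coincide and the strict comparison reports the span as not valid.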

If antialiasing is performed, then the determination of the span is more difficult because antialiasing generally considers each pixel to be made up of a number of subpixels. In a preferred embodiment, each subpixel is considered to be a separate entity. Thus, in a preferred embodiment the left and right edges and the span are determined as follows when antialiasing is performed. Two lines are determined, a first line for the first row of subpixels and a last line for the last row of subpixels for the current line. If the center of a subpixel is inside or on the border of the primitive, the subpixel is considered to be part of the primitive. If the centers of all of the subpixels within a pixel are part of the primitive, the pixel is considered whole (all part of the primitive). Otherwise, the pixel is considered to be partial.

The sides which intersect the first and last line of subpixels are determined. Interpolation is used to calculate the coordinates at which the sides intersect the first and last line of subpixels. The intersection of the left side with the first line and last line of subpixels are termed XL1 and XLF. Similarly, the intersection of the right side with the first line and last line of subpixels are termed XR1 and XRF. If only two sides of the primitive intersect the first and last line of subpixels for the current line, then the left edge is the truncated value of the minimum (farthest left) of XL1 and XLF, plus 1/(number of rows of subpixels). The right edge is the truncated value of the maximum (farthest right) of XR1 and XRF, plus (number of rows of pixels −1)/(number of rows of subpixels). In the preferred embodiment, another left variable is defined by the truncated value of the maximum of XL1 and XLF, plus (number of rows of pixels −1)/(number of rows of subpixels). Another right variable is defined by the truncated value of the minimum of XR1 and XRF, plus 1/(number of rows of subpixels). If the minimum of XL1 and XLF is less than or equal to the maximum of XR1 and XRF, then the span is valid.
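The two-side case described above reduces to a few arithmetic steps, sketched below. The sketch rests on an interpretive assumption: the numerator "(number of rows of pixels −1)" is read here as (number of rows of subpixels − 1), so that the offset is a fraction of one pixel. The function name `antialiased_edges` and the result keys `other_left` and `other_right` are hypothetical labels for the quantities the text calls "another left variable" and "another right variable".

```python
import math


def antialiased_edges(xl1, xlf, xr1, xrf, subpixel_rows):
    """Edges for the case where only two sides cross the first and last
    subpixel lines. Assumes the '(rows - 1)' numerator counts subpixel rows."""
    n = subpixel_rows
    return {
        # Truncated farthest-left intersection plus 1/n.
        "left_edge": math.trunc(min(xl1, xlf)) + 1 / n,
        # Truncated farthest-right intersection plus (n - 1)/n.
        "right_edge": math.trunc(max(xr1, xrf)) + (n - 1) / n,
        # The additional left and right variables.
        "other_left": math.trunc(max(xl1, xlf)) + (n - 1) / n,
        "other_right": math.trunc(min(xr1, xrf)) + 1 / n,
        # Validity test stated in the text.
        "valid": min(xl1, xlf) <= max(xr1, xrf),
    }


print(antialiased_edges(xl1=3.2, xlf=3.7, xr1=8.1, xrf=7.6, subpixel_rows=4))
```

For these sample values with four rows of subpixels, the left edge is 3.25 and the right edge is 8.75, while the additional variables are 3.75 and 7.25.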

In addition, several special cases are preferably considered in calculating the span when antialiasing is performed. In the special case that a top or bottom vertex of the primitive intersects the current line, the x coordinates for the edges (discussed above) are substituted with the top or bottom, respectively, of the vertex. If a middle vertex (not the top or the bottom vertex of the primitive) for the primitive is located on the current line and the vertex is the farthest left (or right) portion of the primitive, then the intersection of the top and bottom lines of pixels are set equal to the coordinates of the vertex. If, however, the vertex is not the farthest left or right point in the primitive, then two sides will contribute to the calculation of the edge. In addition, more than one vertex of the vertices may reside in the same pixel. As a result, the pixels in the span are all considered to be partial and the span is calculated as described above.

As described above, interpolation is used to determine the intersections of a side with the current line. The intersection of a side with the current line, termed Xc, is preferably given by:

Xc = Xt + (Xb − Xt)Q = Xb(Q) + Xt(1 − Q)

where:

-   Q = (Yc − Yt)/(Yb − Yt)
-   Xt = X coordinate of the top of the side (at the vertex)
-   Xb = X coordinate of the bottom of the side (at the vertex)
-   Yc = Y coordinate of the current line
-   Yt = Y coordinate of the top of the side (at the vertex)
-   Yb = Y coordinate of the bottom of the side (at the vertex)

In a preferred embodiment, the resultant of the interpolation is rounded to the nearest sub-pixel. In a preferred embodiment, if there is an error introduced due to rounding, up to a full pixel could be added to or subtracted from the rounded resultant of the interpolation. In a preferred embodiment, the filling of the primitive is determined based upon the intersection of the primitive with the current line. In addition, in a preferred embodiment, an indication of whether the result of the interpolation is rounded is provided.
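A short numerical sketch of the interpolation and sub-pixel rounding follows. It assumes that Yc denotes the y coordinate of the current line, that there are four sub-pixels per pixel, and that rounding is round-half-up; the function names `intersect_x` and `round_to_subpixel` are hypothetical. The returned flag mirrors the indication of whether the result was rounded.

```python
import math


def intersect_x(xt, yt, xb, yb, yc):
    """Xc = Xt + (Xb - Xt) * Q, with Q = (Yc - Yt) / (Yb - Yt)."""
    q = (yc - yt) / (yb - yt)
    return xt + (xb - xt) * q


def round_to_subpixel(x, subpixels_per_pixel=4):
    """Round x to the nearest sub-pixel and report whether rounding occurred."""
    rounded = math.floor(x * subpixels_per_pixel + 0.5) / subpixels_per_pixel
    return rounded, rounded != x


# A side running from its top vertex (5, 0) to its bottom vertex (0, 3),
# evaluated at the current line Yc = 1.
xc = intersect_x(xt=5, yt=0, xb=0, yb=3, yc=1)
q = (1 - 0) / (3 - 0)
# Both algebraic forms of the formula give the same value.
assert math.isclose(xc, 0 * q + 5 * (1 - q))
print(round_to_subpixel(xc))
```

Here Q is one third, so Xc is approximately 3.333, which rounds to 3.25 on a grid of four sub-pixels per pixel.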

Thus, in a preferred embodiment, the merge circuit 150 determines the left and right edges, as well as the span of each primitive at the current line. However, in an alternate embodiment, these characteristics of the primitives could be determined by another component. In addition, although the preferred method for determining the span is described above, nothing prevents the span from being determined in another fashion.

The merge circuit 150 thus preferably calculates the left edge, the right edge and the span. The merge block 152 also determines which primitives are valid based upon the span. The merge circuit 150 passes primitives which are valid to the distributor 160. In a preferred embodiment the index and type of the primitive are provided to the distributor, in addition to the left edge, the right edge and the additional left and right variables using outputs 158. The index of the primitive identifies the primitive. The primitive type indicates whether the top or bottom edge is to the left. For a triangle, the bottom edge is to the left when the middle vertex is on the right side of the top and bottom vertices. Similarly, the bottom edge is to the right when the middle vertex is to the left of the top and bottom vertices.

The distributor 160 includes a compare block 162 and a distribute block 164. The distributor 160 compares the current x position with the right edges of the primitives using the compare block 162. If the right edge of the primitive is less than or equal to the current x position (i.e. the entire primitive is to the left of the current position), then the distributor discards the primitive. Thus, primitives which will no longer contribute to the current line are discarded. As a result, fragments for these primitives will not be generated. The distributor provides primitives to the processor block 120 via outputs 166 and 168. The outputs 166 provide the primitives from the distributor 160 to the processors 121. However, the number of processors 121 is limited. As a result, the distributor 160 will provide as many primitives for a particular line as the processors 121 can receive. Thus, if there are sixteen processors 121, then the distributor 160 provides up to sixteen primitives to the processors 121 for a given pixel. If there are additional primitives for the pixel, the remaining primitives are provided to the bypass processor 123 through the outputs 168 and the controller 170. For example, if there are twenty primitives on a particular line, the distributor 160 will provide sixteen primitives to the sixteen processors 121 and provide the remaining four primitives to the bypass processor 123. The remaining four primitives are then processed, one by one, using the bypass processor 123.
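The discard-and-split behavior of the distributor 160 can be sketched in a few lines. This is a simplified model: the function name `distribute` and the tuple layout `(index, right_edge)` are hypothetical, and the primitives are assumed to arrive already in the order in which they should be assigned.

```python
def distribute(primitives, current_x, num_processors=16):
    """Split live primitives between the processors and the bypass processor.

    primitives is a list of (index, right_edge) tuples.
    """
    # Compare block: discard primitives whose right edge is less than or
    # equal to the current x position.
    live = [p for p in primitives if p[1] > current_x]
    # Distribute block: up to num_processors primitives go to the processors;
    # any remainder goes to the bypass processor.
    return live[:num_processors], live[num_processors:]


# Twenty primitives, none of them expired, and sixteen processors.
twenty = [(i, 100) for i in range(20)]
to_processors, to_bypass = distribute(twenty, current_x=0)
print(len(to_processors), len(to_bypass))
```

This reproduces the twenty-primitive example: sixteen primitives go to the processors and the remaining four go to the bypass processor.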

Some of the outputs 168 from the distributor 160 are provided to the feedback circuit 180. The feedback circuit 180 is preferably a FIFO. Consequently, the primitive(s) (as embodied in the index and the primitive type) are fed back through the FIFO circuitry 180 and provided to the merge circuit 150. When a new pixel is to be processed, any new primitives are received by the merge circuit 150. The fed back primitive(s) are merged with any new primitives using the merge block 152. The new primitives and the fed back primitive(s) are then again provided to the distributor 160 through the controller 170. As long as they have not expired, the fed back primitive(s) are again passed to the bypass processor 123. A portion of the new primitives is provided to the processors 121 if any processors 121 have become vacant. If the processors 121 are still full, then some portion of the new primitives is provided to the bypass processor 123, along with the fed back primitives. Any primitives (both fed back and new) that are provided to the bypass processor 123 are also provided to the feedback circuit 180 unless the primitive is evicted from the bypass processor 123, as described below.

Thus, the x-loop circuitry 140 loads the processors 121 with primitives for a current line and provides any primitives in excess of the number of processors 121 to the bypass processor 123. The primitives provided to the bypass processor 123 are also fed back to the merge circuit 150 through the feedback circuit 180. The primitives provided to the bypass processor 123 can thus be looped through the x-loop circuitry 140 and provided to the bypass processor 123 for rendering subsequent pixels which the primitives intersect. Because of the use of the bypass processor 123 and the x-loop circuitry 140, the computer graphics system 100 can render a frame in which more primitives than the number of processors 121 intersect pixels of a single line. For the same reason, the computer graphics system 100 can handle situations in which more primitives than the number of processors 121 intersect a single pixel. In a preferred embodiment, in which a single bypass processor 123 is used, it may take longer to process the primitives provided to the bypass processor 123 because the single bypass processor 123 handles all primitives that are not provided to the processors 121. However, the system 100 is still capable of rendering more complex scenes.

In addition, in a preferred embodiment, the y-loop circuitry 130 is also used. Using the y-loop 130 and the method 200, primitives can be continuously loaded and ejected. As a result, any delays at the end of a line due to ejecting and loading of primitives can be reduced or eliminated. Thus, loading of primitives to the processors 121 in the graphics system 100 can be made more efficient. Furthermore, because the feedback circuit 134 can hold data for a large number of primitives, the y-loop 130 can be used with a large number of virtual (or actual) processors. This feature also allows more primitives to overlap a single pixel.

FIG. 7 is a high-level flow chart of one embodiment of a method 200 in accordance with the present invention for more efficiently using processors in a computer graphics system. The method 200 preferably uses the system 100. Consequently, the method 200 is described in the context of the computer graphics system 100.

It is determined whether a new primitive for the current line should be provided to the processors 121, via step 202. Step 202 preferably determines whether there are new primitives which commence to the left of the current position. If there are no new primitives, then step 202 is repeated. If there are new primitives, then the new primitives are merged with any fed back primitives, via step 204. Any expired primitives are evicted, via step 206. If any of the processors 121 are available, then at least a portion of the new primitives is provided to the available processors, via step 208. A processor is available if it is not in use for rendering an unexpired primitive for the current position. It is determined whether there are additional primitives, both new and fed back, to be processed, via step 210. If not, the position is incremented to the next pixel in the line, via step 214. If there are additional primitives to be processed, the remaining primitives are provided to the bypass processor 123 and fed back, via step 212. The position is then incremented in step 214 and the method 200 returns to step 202.
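
The flow of the method 200 can be traced with a small software simulation of a single line. This is a sketch under stated assumptions rather than the disclosed hardware: `Prim`, `render_line`, `pending`, `feedback` and `trace` are hypothetical names, a primitive is treated as expired once its right edge is less than or equal to the current position, and the position is simply advanced once per pixel instead of repeating step 202.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Prim:
    index: int   # hypothetical identifier
    left: int    # first pixel covered on this line
    right: int   # expired once right <= current position


def render_line(prims, width, num_processors):
    """Return, for each pixel, (indices held by processors, indices sent to bypass)."""
    pending = deque(sorted(prims, key=lambda p: p.left))  # ordered by left edge
    processors = []      # primitives currently held by the processors
    feedback = deque()   # stand-in for the feedback FIFO
    trace = []
    for x in range(width):
        # Step 202: collect new primitives that start at or before this position.
        new = []
        while pending and pending[0].left <= x:
            new.append(pending.popleft())
        # Step 204: take the fed back primitives so they can be merged with the new ones.
        fed_back = list(feedback)
        feedback.clear()
        # Step 206: evict expired primitives.
        processors = [p for p in processors if p.right > x]
        fed_back = [p for p in fed_back if p.right > x]
        new = [p for p in new if p.right > x]
        # Step 208: new primitives fill any processors that are available.
        vacant = num_processors - len(processors)
        processors.extend(new[:vacant])
        # Steps 210 and 212: remaining primitives go to the bypass processor
        # and are fed back for the next pixel.
        bypass = fed_back + new[vacant:]
        feedback.extend(bypass)
        trace.append(([p.index for p in processors], [p.index for p in bypass]))
        # Step 214: the loop advances to the next pixel.
    return trace


# Two processors, four primitives: primitive 2 must use the bypass path.
line = [Prim(0, 0, 4), Prim(1, 0, 2), Prim(2, 0, 3), Prim(3, 2, 4)]
for x, (in_processors, in_bypass) in enumerate(render_line(line, 4, 2)):
    print(x, in_processors, in_bypass)
```

In this run, primitive 2 stays on the bypass path even after a processor becomes vacant at position 2, because the vacant processor is given to the new primitive 3. This mirrors the description above, in which fed back primitives are again passed to the bypass processor while new primitives fill vacant processors.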

Because of the use of the bypass processor 123 and because the primitives provided to the bypass processor 123 are fed back for processing, the method 200 can render a frame in which more primitives than the number of processors 121 intersect pixels of a single line. For the same reason, the method 200 can handle situations in which more primitives than the number of processors 121 intersect a single pixel. In a preferred embodiment, in which a single bypass processor 123 is used, it may take longer to process the primitives provided to the bypass processor 123 because the single bypass processor 123 handles all primitives that are not provided to the processors 121. However, the method 200 is still capable of rendering more complex scenes.

FIGS. 8A and 8B depict a more detailed flow chart of one embodiment of a method 250 in accordance with the present invention for more efficiently using processors in a computer graphics system. The method 250 preferably uses the system 100. Consequently, the method 250 is described in the context of the computer graphics system 100.

The current position in the line is determined, preferably using the pixel counter 172, via step 252. It is determined whether the feedback FIFO 180 is empty, via step 254. If the feedback FIFO 180 is empty, then it is determined whether any new primitives begin at the current position, via step 256. Thus, step 256 determines whether the left edge of any primitive is at the current position. If it is determined in step 256 that no new primitives commence at the current position, then the current position in the line is incremented, via step 280. If it is determined in step 256 that a new primitive does start at the current position, then the primitives are provided to the distributor 160, via step 260. If it is determined in step 254 that the feedback FIFO 180 is not empty, then the primitives are unloaded from the feedback FIFO 180, via step 258.

After the primitives are unloaded in step 258 or after the primitives are provided to the distributor 160 in step 260, it is determined whether the processors 121 are full, via step 262. Thus, step 262 determines whether there are any processors 121 available for the current primitives. If it is determined in step 262 that the processors 121 are not full, then the span and other variables such as the left and right edges are calculated, via step 264. The primitive and the data calculated in step 264 are provided to one of the processors 121, via step 266. The primitive is then evicted from the x-loop 140, via step 268. The primitive can be evicted because it has been sent to one of the processors 121 and is no longer needed in the x-loop 140.

If it is determined in step 262 that the processors 121 are full, then the span and other variables such as the left and right edges are calculated, via step 270. The primitive and the data calculated in step 270 are loaded to the bypass processor 123, via step 272. It is then determined whether the next pixel would still be active for the primitive, via step 274. Step 274 preferably includes comparing the right edge of the primitive to the next pixel. If the next pixel is not active, then the primitive has completed processing for the current line and is thus evicted from the x-loop 140, via step 276. Otherwise, the primitive is loaded into the feedback FIFO 180, via step 278. The method 250 then returns to determining the current position in the line in step 252.
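
The per-primitive decisions of steps 262 through 278 can be summarized as a small routing function. This is an illustrative sketch only: `Route` and `route_primitive` are hypothetical names, the span calculations of steps 264 and 270 are omitted, and the next pixel is assumed to be active when the right edge of the primitive is greater than the next position, consistent with the discard test used by the distributor 160.

```python
from enum import Enum


class Route(Enum):
    PROCESSOR = "provided to a processor, then evicted from the x-loop"
    BYPASS_THEN_EVICT = "loaded to the bypass processor, then evicted from the x-loop"
    BYPASS_THEN_FEED_BACK = "loaded to the bypass processor, then loaded into the feedback FIFO"


def route_primitive(right_edge, x, processors_full):
    """Decide what happens to one primitive at position x."""
    # Step 262: are any processors available?
    if not processors_full:
        # Steps 264 to 268: the primitive goes to a processor and leaves the x-loop.
        return Route.PROCESSOR
    # Steps 270 and 272: the primitive goes to the bypass processor.
    # Step 274: compare the right edge with the next pixel.
    next_pixel_active = right_edge > x + 1
    if next_pixel_active:
        # Step 278: keep the primitive circulating through the feedback FIFO.
        return Route.BYPASS_THEN_FEED_BACK
    # Step 276: the primitive is finished for this line.
    return Route.BYPASS_THEN_EVICT


print(route_primitive(right_edge=5, x=3, processors_full=True).name)
```

Only primitives that take the bypass path are ever fed back. A primitive accepted by one of the processors 121 is tracked by that processor, so the x-loop 140 no longer needs to hold it.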

Using the method 250, the graphics system 100 can render a frame in which more primitives than the number of processors 121 intersect pixels of a single line. For the same reason, the method 250 can handle situations in which more primitives than the number of processors 121 intersect a single pixel. As a result, the graphics system 100 can render a more complex scene while utilizing fewer processors and, therefore, less space in the computer graphics system 100.

A method and system have been disclosed for more efficiently utilizing processors of a graphics system. Software written according to the present invention is to be stored in some form of computer-readable medium, such as memory or CD-ROM, or transmitted over a network, and executed by a processor. Consequently, a computer-readable medium is intended to include a computer-readable signal which, for example, may be transmitted over a network. Although the present invention has been described in accordance with the embodiments shown, one of ordinary skill in the art will readily recognize that there could be variations to the embodiments and those variations would be within the spirit and scope of the present invention. Accordingly, many modifications may be made by one of ordinary skill in the art without departing from the spirit and scope of the appended claims.

1. A system for utilizing at least one processor and at least one bypass processor of a computer graphics system, the at least one processor including a particular number of processors, the at least one processor and the at least one bypass processor for rendering a plurality of primitives, each of the plurality of primitives having a left corner and a right corner, the plurality of primitives being ordered based on the left corner of each of the plurality of primitives, the system comprising: a merge circuit for determining a left edge for each of the plurality of primitives and determining a right edge for each of the plurality of primitives; a distributor, coupled with the feedback circuit, for outputting a first portion of the plurality of primitives, the distributor providing a second portion of the plurality of primitives to the at least one processor and providing a third portion of the plurality of primitives to the at least one bypass processor if the first portion of the plurality of primitives includes more primitives than the particular number of processors, the second portion of the plurality of primitives including a number of primitives that is not greater than the particular number of processors; a feedback circuit, coupled to the merge circuit and the distributor, for re-inputting a fourth portion of the plurality of primitives to the at least one bypass processor until the first portion of the plurality of primitives has been rendered for a particular line; and a controller for controlling the feedback circuit, the distributor and the merge circuit.
2. The system of claim 1 wherein the distributor further discards an expired portion of the primitives, each of the expired portion of the primitives having a right edge to the left of a current position.
3. The system of claim 1 wherein the merge circuit receives the fourth portion of the plurality of primitives and provides the fourth portion of the plurality of primitives to the distributor and wherein the distributor provides a fifth portion of the plurality of primitives to the at least one bypass processor.
4. The system of claim 1 wherein the merge circuit further determines whether the left edge of the primitive is left of the right edge of the primitive and wherein each of the first portion and the second portion of the plurality of primitives has a left edge that is to the left of the right edge.
5. The system of claim 1 wherein the first portion of the plurality of primitives resides on a single line of a display.
6. The system of claim 1 wherein the merge circuit further calculates a span for each of the plurality of primitives.
7. The system of claim 6 wherein the plurality of primitives are antialiased; and wherein the merge circuit further calculates the span using the left side of the primitive, the right side of the primitive and whether a current pixel is completely covered or partially covered by the primitive.
8. The system of claim 1 wherein the feedback circuit further includes a first in first out (“FIFO”) buffer.
9. The system of claim 1 further comprising: a sorter, coupled with the merge circuit, for sorting the plurality of primitives horizontally.
10. The system of claim 9 wherein the sorter sorts the plurality of primitives horizontally from left to right, based upon the left edge of the primitive.
11. The system of claim 9 wherein the sorter is a hardware sorter.
12. The system of claim 9 further including y-loop circuitry for providing the first of the plurality of primitives for a current line to the sorter.
13. The system of claim 12 wherein each of the plurality of primitives has a top and a bottom, wherein the plurality of primitives are sorted based on the top of each of the plurality of primitives and wherein the y-loop circuitry further includes: at least one input for receiving data relating to each of the plurality of primitives; a second merge circuit, coupled with the input, for adding the data for a primitive having a top that is not lower than a current line; a second distributor, coupled with the second merge circuit, for eliminating an expired primitive and outputting the data for a remaining portion of the primitives after the expired primitive has been removed, the expired primitive having a bottom that is above a current line; a second feedback circuit, coupled with the second distributor and the second merge circuit, for re-inputting to the second merge circuit the data for the remaining portion of the plurality of primitives; and a second controller for controlling the second feedback circuit, the second distributor and the second merge circuit.
14. A method for utilizing at least one processor and at least one bypass processor of a computer graphics system, the at least one processor including a particular number of processors, the at least one processor and the at least one bypass processor for rendering a plurality of primitives, each of the plurality of primitives having a left corner and a right corner, the plurality of primitives being ordered based on the left corner of each of the plurality of primitives, the method comprising the steps of: (a) providing a first portion of the plurality of primitives to the at least one processor if the at least one processor is not full; (b) providing a second portion of the plurality of primitives to the at least one bypass processor if the at least one processor is full; and (c) re-inputting a fourth portion of the plurality of primitives to the at least one bypass processor until the first portion of the plurality of primitives has been rendered for a particular line.
15. The method of claim 14 further comprising the step of: discarding an expired portion of the primitives prior to providing the first and second portions of the primitives to the at least one processor and the at least one bypass processor, each of the expired portion of the primitives having a right edge to the left of a current position.
16. The method of claim 14 further comprising the step of: (d) determining whether the left edge of the primitive is left of the right edge of the primitive, wherein each of the first portion and the second portion of the plurality of primitives has a left edge that is to the left of the right edge.
17. The method of claim 14 wherein the first portion of the plurality of primitives resides on a single line of a display.
18. The method of claim 14 further comprising the step of: (d) calculating a span for each of the plurality of primitives.
19. The method of claim 18 wherein the plurality of primitives are antialiased and wherein the span calculating step (d) further includes the step of: (d1) calculating the span using the left side of the primitive, the right side of the primitive and whether a current pixel is completely covered or partially covered by the primitive.
20. The method of claim 14 further comprising the step of: (d) sorting the plurality of primitives horizontally prior to determining the left edge of each of the plurality of primitives.
21. The method of claim 20 wherein the sorting step (d) further includes the step of: (d1) sorting the plurality of primitives horizontally from left to right, based upon the left edge of the primitive.
22. The method of claim 20 further comprising the step of: (e) providing the first portion of the plurality of primitives for a current line to a sorter for performing the sorting step (d).
23. The method of claim 22 wherein each of the plurality of primitives has a top and a bottom, wherein the plurality of primitives are sorted based on the top of each of the plurality of primitives and wherein the first portion providing step (e) further includes the steps of: (e1) determining whether the top of at least one new primitive of the plurality of primitives is not lower than a current line; (e2) merging data for the at least one new primitive if the top is not lower than the current line; (e3) eliminating an expired primitive and outputting at least a portion of data for a remaining portion of the primitives after the expired primitive has been removed, the expired primitive having a bottom that is above the current line, the data output by the distributor controlling loading of the plurality of primitives by the at least one processor; (e4) re-inputting to the merge circuit data for the remaining portion of the plurality of primitives.